Patent Classification Experiments with the Linguistic Classification System LCS
Abstract
In the context of the CLEF-IP 2010 classification task, we conducted a series of experiments with the Linguistic Classification System (LCS). We compared two document representations for patent abstracts: a bag-of-words representation and a syntactic/semantic representation containing both words and dependency triples. We evaluated two types of output: a fixed cut-off on the ranking of the classes and a flexible cut-off based on a threshold on the classification scores. Using the Winnow classifier, we obtained an improvement in classification scores when triples were added to the bag of words. However, our results are remarkably better on a held-out subset of the target data than on the 2,000-topic test set. The main findings of this paper are: (1) adding dependency triples to words has a positive effect on classification accuracy, and (2) selecting classes by a threshold on the classification scores, instead of returning a fixed number of classes per document, improves classification scores while also lowering the number of classes that need to be judged manually by the professionals at the patent office.
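The two output types compared in the abstract can be illustrated with a minimal sketch. This is not the LCS implementation: the function names, the `min_classes` fallback, the class codes, and the scores are all hypothetical and chosen only to show the difference between a fixed rank cut-off and a score threshold.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Fixed cut-off: return the k highest-scoring classes."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def select_by_threshold(
    scores: Dict[str, float], threshold: float, min_classes: int = 1
) -> List[str]:
    """Flexible cut-off: return every class scoring at or above the threshold.

    Falling back to the top `min_classes` classes when too few pass the
    threshold is an assumption of this sketch, not a detail from the paper.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = [c for c in ranked if scores[c] >= threshold]
    if len(selected) < min_classes:
        return ranked[:min_classes]
    return selected


# Hypothetical classifier scores for one patent abstract (illustrative only).
scores = {"A61K": 2.1, "C07D": 1.4, "G06F": 0.3, "H04L": -0.5}

fixed = select_top_k(scores, k=3)
flexible = select_by_threshold(scores, threshold=1.0)
print(fixed)
print(flexible)
```

With these made-up scores, the threshold variant returns fewer classes than the fixed cut-off of three, which is the effect the abstract describes: fewer candidate classes for a human assessor to review.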
Related resources
Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)
This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, the selection of suitable features may significantly affect the performance of the process. Simulations were conducted at SNRs of 5 dB and 10 dB. Test and ...
Thyroid disorder diagnosis based on Mamdani fuzzy inference system classifier
Introduction: Classification and prediction are two of the most important applications of statistical methods in the field of medicine. Classical classification is based on clinical symptoms and does not make use of specialized information and knowledge. Therefore, a classifier that can combine all of this information is necessary. The aim of this s...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
A Hybrid System of Deep Learning and Learning Classifier System for Database Intrusion Detection
Nowadays, as most companies and organizations rely on databases to safeguard sensitive data, strong protection of that data must be guaranteed. An intrusion detection system (IDS) can be an important component of a strong security framework, and a machine learning approach with adaptation capability has a great advantage for such a system. In this paper, we propose a hybrid...
Automatic thematic classification of election manifestos
We aim to develop a classifier which assigns themes to unseen Dutch election manifestos written after Lipschits’ work. We have to rely on the older data from the eighties and nineties for training and optimization of the classifier. System was tuned by testing on 1998 data, while using older data as training material. Balanced Winnow, implementation in the Linguistic Classification System ...